This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
library(readxl)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(tseries)
library(TTR)
library(ggplot2)
library(tidyr)
sales_data <- read.csv("/Users/kiannazem/Downloads/TOTALSA.csv")
sales_data$Date <- as.Date(sales_data$Date, format = "%m/%d/%Y")
sales_ts <- ts(sales_data$Sales.Units.in.Millions., start = c(2019, 1), frequency = 12)
print(head(sales_data))
## Date Sales.Units.in.Millions.
## 1 2019-01-01 16.970
## 2 2019-02-01 16.962
## 3 2019-03-01 17.842
## 4 2019-04-01 16.968
## 5 2019-05-01 17.967
## 6 2019-06-01 17.781
plot(sales_ts, main = "Time Series Plot of Sales", xlab = "Time", ylab = "Sales", type = "o")
summary_stats <- summary(sales_ts)
print(summary_stats)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.944 14.189 15.938 15.612 16.966 18.697
boxplot(sales_ts, main = "Boxplot of Sales", ylab = "Sales")
The time series plot shows an overall trend with fluctuations over time,
beginning with relatively stable sales levels from 2019 through early
2020, followed by a sharp decline around early 2020, likely due to
external factors such as the COVID-19 pandemic. After this drop, there
is a noticeable recovery through late 2020 and 2021, with sales
gradually returning to pre-2020 levels. Starting in 2022, the series
becomes more stable, with a clear upward trend and smaller fluctuations,
suggesting a consistent seasonal pattern emerging. This stability,
particularly in the post-2022 data, makes it well-suited for
forecasting, as it reflects the current dynamics of the series without
the disruptions seen in earlier years. While seasonality is present, it
is not overly pronounced but becomes more evident in recent years,
especially in the upward movement observed toward the end of 2023.
The summary statistics and box plot for the full dataset show that the minimum sales value is 8.944 million, corresponding to the sharp drop observed in 2020, likely due to external disruptions such as the COVID-19 pandemic. The maximum value is 18.697 million, reflecting peak sales performance during the recovery period. The mean sales value of 15.612 million is slightly lower than the median of 15.938 million, indicating a near-symmetric distribution with a slight left skew. Q1 is 14.189 million and Q3 is 16.966 million, giving an IQR of approximately 2.777 million, so the data is relatively tightly clustered around the central values. The box plot highlights a single outlier below the lower whisker, corresponding to the sharp decline during 2020, while the majority of the sales values fall between Q1 and Q3. Overall the data is stable, but the 2020 drop is a notable deviation that justifies evaluating whether excluding earlier data could improve the forecasting models. For this reason, all data prior to 2022 is removed in the next step.
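As a quick sanity check, the 2020 minimum can be compared against the standard 1.5 × IQR lower fence that boxplot() uses to flag outliers. This is a sketch using the quartiles copied from the summary() output above:

```r
# Boxplot outlier check using the reported full-sample quartiles
q1 <- 14.189
q3 <- 16.966
iqr <- q3 - q1                 # 2.777
lower_fence <- q1 - 1.5 * iqr  # 10.0235
min_sales <- 8.944
min_sales < lower_fence        # TRUE: the 2020 trough is flagged as an outlier
```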
sales_data <- subset(sales_data, Date >= as.Date("2022-01-01"))
sales_ts <- ts(sales_data$Sales.Units.in.Millions., start = c(2022, 1), frequency = 12)
print(head(sales_data))
## Date Sales.Units.in.Millions.
## 37 2022-01-01 14.866
## 38 2022-02-01 14.168
## 39 2022-03-01 14.253
## 40 2022-04-01 14.681
## 41 2022-05-01 13.286
## 42 2022-06-01 13.669
plot(sales_ts, main = "Time Series Plot of Sales", xlab = "Time", ylab = "Sales", type = "o")
summary_stats <- summary(sales_ts)
print("Summary Statistics:")
## [1] "Summary Statistics:"
print(summary_stats)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.29 14.19 15.47 15.18 15.95 16.45
boxplot(sales_ts, main = "Boxplot of Sales", ylab = "Sales")
decomposition <- decompose(sales_ts)
print(decomposition$type)
## [1] "additive"
decomp <- decompose(sales_ts)
plot(decomp)
seasonal_indices <- decomp$seasonal
print(seasonal_indices)
## Jan Feb Mar Apr May Jun
## 2022 0.37197569 -0.03735764 0.03405903 0.69685069 0.11885069 0.44622569
## 2023 0.37197569 -0.03735764 0.03405903 0.69685069 0.11885069 0.44622569
## 2024 0.37197569 -0.03735764
## Jul Aug Sep Oct Nov Dec
## 2022 -0.12229514 -0.27592014 -0.39777431 0.42472569 -0.11773264 -1.14160764
## 2023 -0.12229514 -0.27592014 -0.39777431 0.42472569 -0.11773264 -1.14160764
## 2024
adjusted_ts <- sales_ts - decomp$seasonal
plot(adjusted_ts, main = "Seasonally Adjusted Time Series", type = "o", col = "red")
lines(sales_ts, col = "blue", lty = 2)
legend("topright", legend = c("Adjusted", "Actual"), col = c("red", "blue"), lty = c(1, 2))
This decomposition is additive. The time series is also seasonal, as
indicated by the decomposition and the monthly indices, which exhibit
consistent periodic fluctuations. Based on the indices, the highest
seasonal values occur in April (0.697) and June (0.446), suggesting that
sales are typically stronger during these months, possibly due to
increased demand tied to seasonal factors such as product usage trends
in spring and early summer. Conversely, the lowest seasonal value is in
December (-1.142), indicating a significant drop in sales during this
time, which could be attributed to the end-of-year slowdown or competing
consumer priorities such as holiday-related spending. The indices also
show moderate dips in July (-0.123) and August (-0.276), suggesting a
seasonal decrease during late summer. These patterns demonstrate a clear
and recurring seasonal component, which should be accounted for in
forecasting models to improve accuracy.
However, seasonality does not drive large fluctuations in the level of the time series. In the seasonally adjusted plot, the adjusted series (solid red line) is much smoother than the actual series (dashed blue line).
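As a sanity check on the additive decomposition, the twelve monthly indices quoted above should sum to approximately zero over one full cycle (values copied from the decompose() output):

```r
# Additive seasonal indices from the decompose() output above
idx <- c(0.37197569, -0.03735764, 0.03405903, 0.69685069, 0.11885069,
         0.44622569, -0.12229514, -0.27592014, -0.39777431, 0.42472569,
         -0.11773264, -1.14160764)
sum(idx)  # effectively 0, up to floating-point rounding
```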
naive_model <- naive(sales_ts, h=12)
plot(naive_model, main = "Naive Forecast")
print(naive_model)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Mar 2024 16.191 15.35813 17.02387 14.91723 17.46477
## Apr 2024 16.191 15.01314 17.36886 14.38962 17.99238
## May 2024 16.191 14.74842 17.63358 13.98477 18.39723
## Jun 2024 16.191 14.52526 17.85674 13.64346 18.73854
## Jul 2024 16.191 14.32864 18.05336 13.34277 19.03923
## Aug 2024 16.191 14.15089 18.23111 13.07092 19.31108
## Sep 2024 16.191 13.98743 18.39457 12.82093 19.56107
## Oct 2024 16.191 13.83528 18.54672 12.58824 19.79376
## Nov 2024 16.191 13.69238 18.68962 12.36969 20.01231
## Dec 2024 16.191 13.55723 18.82477 12.16299 20.21901
## Jan 2025 16.191 13.42867 18.95333 11.96639 20.41561
## Feb 2025 16.191 13.30585 19.07615 11.77854 20.60346
residuals_naive <- residuals(naive_model)
plot(residuals_naive, main = "Residuals", type = "o")
hist(residuals_naive, main = "Histogram of Residuals")
plot(fitted(naive_model), residuals_naive, main = "Fitted Values vs Residuals", xlab = "Fitted", ylab = "Residuals")
residuals_naive <- residuals_naive[!is.na(residuals_naive)]
acf(residuals_naive)
accuracy_naive <- accuracy(naive_model)
print(accuracy_naive)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.053 0.6498938 0.51412 0.2460432 3.411387 0.3230122 -0.4356982
The residuals plot shows the variation between the actual and predicted values over time, and the residuals appear to fluctuate randomly around zero without any obvious pattern. This indicates that the Naive model does not leave significant trends or seasonality in the residuals, which is a good sign. However, some spikes suggest occasional overestimation or underestimation. The histogram of the residuals suggests that the residuals are approximately normally distributed, as the frequencies peak around zero and taper off symmetrically, though there is some skewness at the tails, indicating minor deviations from normality.
The plot of fitted values versus residuals shows that the residuals are scattered without any systematic pattern, which is desirable, as it suggests the Naive model does not suffer from issues like heteroscedasticity or model misfit. Lastly, the ACF plot of residuals indicates that while most lags fall within the confidence bounds, there is a significant spike at lag 1, suggesting some autocorrelation. This implies that while the Naive model is relatively effective, there is still room for improvement in capturing the full dynamics of the time series.
The accuracy of the naive model, based solely on RMSE, is 0.65, which indicates a moderate level of prediction error. This suggests that while the model provides a reasonable starting point, it does not fully capture the variability in the data. For the next year, the model predicts a constant value of 16.191 for each month, which aligns with the naive approach of assuming no change from the most recent observed value. The confidence intervals widen over time, indicating increasing uncertainty, with 80% bounds starting at 15.36 to 17.02 in March 2024 and expanding to 13.31 to 19.08 by February 2025. A key observation is that the naive model’s simplicity makes it a useful baseline for comparison, but it does not account for underlying patterns like seasonality or trends, resulting in flat forecasts that may not align with actual future values.
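The widening of the naive intervals follows a known pattern: for a naive forecast the h-step-ahead standard error grows with sqrt(h), so the interval half-width at horizon 2 should be about sqrt(2) times the half-width at horizon 1. A quick check using values copied from the forecast output above:

```r
point <- 16.191
half_h1 <- 17.02387 - point  # 80% interval half-width at h = 1 (Mar 2024)
half_h2 <- 17.36886 - point  # 80% interval half-width at h = 2 (Apr 2024)
half_h2 / half_h1            # ~1.414, i.e. sqrt(2)
```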
plot(sales_ts, main = "Time Series with Moving Averages", xlab = "Time", ylab = "Sales", type = "o")
ma3 <- ma(sales_ts, order = 3)
ma6 <- ma(sales_ts, order = 6)
ma9 <- ma(sales_ts, order = 9)
lines(ma3, col = "red", lwd = 2)
lines(ma6, col = "blue", lwd = 2)
lines(ma9, col = "green", lwd = 2)
legend("topright", legend = c("Original", "MA (3)", "MA (6)", "MA (9)"),
col = c("black", "red", "blue", "green"), lty = 1, lwd = 2)
As the moving average order increases, the plot shows that the series
becomes progressively smoother. The MA(3) line (red) follows the
original data more closely, capturing short-term fluctuations while
still reducing noise. The MA(6) line (blue) further smooths out the
variations, emphasizing the underlying trend while diminishing smaller
fluctuations. Finally, the MA(9) line (green) appears the smoothest,
focusing almost entirely on the overall trend while ignoring most
short-term changes.
This progression illustrates how higher-order moving averages prioritize long-term trends over short-term volatility, making them more suitable for identifying underlying patterns but less effective at responding to recent changes or seasonality in the data. However, this smoothing comes at the cost of losing finer details, which might be critical for short-term forecasting or understanding seasonal dynamics.
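For odd orders, the ma() smoothing used above can be reproduced in base R with stats::filter(); this sketch applies a centered 3-term moving average to a small illustrative series (the data here is made up for demonstration):

```r
# Centered 3-term moving average, equivalent to forecast::ma(x, order = 3)
# for odd orders; endpoints have no complete window and come back as NA
x <- ts(c(5, 6, 7, 8, 9, 10), start = c(2022, 1), frequency = 12)
ma3 <- stats::filter(x, rep(1/3, 3), sides = 2)
ma3  # interior values are the means of each 3-point window
```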
ets_model <- ets(sales_ts)
print(summary(ets_model))
## ETS(A,N,N)
##
## Call:
## ets(y = sales_ts)
##
## Smoothing parameters:
## alpha = 0.5558
##
## Initial states:
## l = 14.5659
##
## sigma: 0.5893
##
## AIC AICc BIC
## 61.13388 62.22479 64.90817
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1008798 0.566215 0.4348502 0.5553676 2.889801 0.2732084
## ACF1
## Training set -0.1227497
ets_forecast <- forecast(ets_model, h = 12)
plot(ets_forecast, main = "Simple Smoothing Forecast (ETS)", xlab = "Time", ylab = "Sales")
print(ets_forecast)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Mar 2024 16.02361 15.26835 16.77887 14.86854 17.17869
## Apr 2024 16.02361 15.15954 16.88769 14.70212 17.34510
## May 2024 16.02361 15.06297 16.98425 14.55444 17.49278
## Jun 2024 16.02361 14.97527 17.07196 14.42030 17.62692
## Jul 2024 16.02361 14.89435 17.15287 14.29655 17.75067
## Aug 2024 16.02361 14.81886 17.22837 14.18110 17.86612
## Sep 2024 16.02361 14.74782 17.29940 14.07246 17.97476
## Oct 2024 16.02361 14.68054 17.36668 13.96956 18.07766
## Nov 2024 16.02361 14.61647 17.43075 13.87158 18.17564
## Dec 2024 16.02361 14.55520 17.49203 13.77786 18.26936
## Jan 2025 16.02361 14.49638 17.55085 13.68791 18.35931
## Feb 2025 16.02361 14.43974 17.60748 13.60129 18.44593
residuals_ets <- residuals(ets_model)
plot(residuals_ets, main = "Residuals (ETS Model)", type = "o")
hist(residuals_ets, main = "Histogram of Residuals (ETS Model)")
plot(fitted(ets_model), residuals_ets, main = "Fitted Values vs Residuals (ETS)", xlab = "Fitted", ylab = "Residuals")
acf(residuals_ets, main = "ACF of Residuals (ETS)")
accuracy_ets <- accuracy(ets_forecast)
print(accuracy_ets)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1008798 0.566215 0.4348502 0.5553676 2.889801 0.2732084
## ACF1
## Training set -0.1227497
The ETS model forecast for the next 12 months uses a simple exponential smoothing approach with additive error. The smoothing parameter, alpha, is 0.5558, indicating moderate weight on recent observations. The initial state of the level component is 14.5659, representing the starting point of the smoothed series. The sigma value, 0.5893, signifies the standard deviation of the residuals, reflecting the model’s inherent variability.
Residual analysis reveals valuable insights into the model’s performance. The residual plot shows variability around zero, suggesting no significant bias, though slight spikes indicate room for improvement in capturing certain patterns. The histogram of residuals highlights a roughly symmetric distribution centered near zero, supporting the assumption of normally distributed errors. The fitted values versus residuals plot shows no systematic pattern, reinforcing the model’s adequacy in capturing the data structure, although slight clustering hints at some residual dependence. The ACF plot of residuals reveals no significant autocorrelations apart from a slight spike at lag 1, which could hint at minor residual dependency.
The RMSE for the ETS model is 0.5662, indicating reasonably accurate forecasts. The forecast for each of the next 12 months is a constant 16.0236, with the 80% prediction interval widening to approximately 14.44 to 17.61 by February 2025. This flat forecast reflects the model’s expectation of stable performance without significant seasonal or trend-driven changes in the time series.
Overall, the ETS model performs well in terms of accuracy, as evidenced by the low RMSE. The consistent forecast values reflect the absence of strong trends or seasonality, aligning with the data’s structure. The residual analysis reinforces the model’s validity, though minor dependencies in residuals suggest opportunities for refinement in future iterations. The forecast provides a reliable projection for the time series, indicating stability in the underlying process.
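Because the selected model is ETS(A,N,N), the fit reduces to simple exponential smoothing: the level is updated as l[t] = alpha * y[t] + (1 - alpha) * l[t-1], and every h-step-ahead forecast equals the final level, which is why all 12 forecasts are identical. A minimal sketch of the recursion, taking alpha and the initial level from the ets() output above but using illustrative observations:

```r
# Simple exponential smoothing level recursion (ETS(A,N,N))
ses_level <- function(y, alpha, l0) {
  l <- l0
  for (yt in y) l <- alpha * yt + (1 - alpha) * l  # level update
  l  # final level = the flat h-step-ahead forecast
}
# alpha and l0 come from the fitted model; the y values are hypothetical
ses_level(c(14.9, 15.2, 15.8), alpha = 0.5558, l0 = 14.5659)
```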
hw_model <- HoltWinters(sales_ts)
plot(hw_model, main = "Holt-Winters Model")
print(paste("Alpha:", hw_model$alpha))
## [1] "Alpha: 0.393071548742986"
print(paste("Beta:", hw_model$beta))
## [1] "Beta: 0"
print(paste("Gamma:", hw_model$gamma))
## [1] "Gamma: 0"
forecast_hw <- forecast(hw_model, h = 12)
plot(forecast_hw, main = "Holt-Winters Forecast", xlab = "Time", ylab = "Sales")
accuracy_hw <- accuracy(forecast_hw)
print(accuracy_hw)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.005262362 0.8045032 0.5856619 0.004108308 3.699873 0.3679606
## ACF1
## Training set -0.07687174
residuals_hw <- residuals(hw_model)
plot(residuals_hw, main = "Residuals (Holt-Winters Model)", type = "o")
hist(residuals_hw, main = "Histogram of Residuals (Holt-Winters Model)")
plot(fitted(hw_model), residuals_hw, main = "Fitted Values vs Residuals (Holt-Winters)", xlab = "Fitted", ylab = "Residuals")
acf(residuals_hw, main = "ACF of Residuals (Holt-Winters)")
print(forecast_hw)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Mar 2024 16.55158 15.48167 17.62149 14.91530 18.18787
## Apr 2024 17.37588 16.22629 18.52548 15.61773 19.13403
## May 2024 16.95939 15.73528 18.18349 15.08728 18.83149
## Jun 2024 17.44827 16.15394 18.74260 15.46876 19.42778
## Jul 2024 16.69453 15.33359 18.05547 14.61315 18.77591
## Aug 2024 16.96700 15.54256 18.39143 14.78851 19.14548
## Sep 2024 17.08880 15.60357 18.57402 14.81734 19.36025
## Oct 2024 18.07280 16.52919 19.61642 15.71205 20.43356
## Nov 2024 17.69185 16.09197 19.29173 15.24505 20.13865
## Dec 2024 16.82949 15.17526 18.48371 14.29956 19.35941
## Jan 2025 18.50458 16.79773 20.21143 15.89418 21.11498
## Feb 2025 18.25675 16.49885 20.01465 15.56828 20.94522
The alpha is 0.393, indicating the weight given to the most recent observations when updating the level. Beta is 0, meaning the trend slope is held fixed at its initial estimate rather than updated, and gamma is 0, meaning the seasonal indices are likewise frozen at their initial estimates. Note that a zero smoothing parameter does not remove a component: the fitted model still carries a fixed trend and a fixed seasonal pattern, which is why the point forecasts vary from month to month. The initial state for the level is 15.87, representing the starting estimate of the time series level.
The residual standard deviation is approximately 0.804 (the training RMSE), which gives an indication of the variability, or uncertainty, in the one-step-ahead forecasts.
For residual analysis, the residuals plot shows scattered values with no recognizable pattern, indicating that the residuals are reasonably random, a key assumption of the model. The histogram of residuals shows a roughly symmetric distribution centered around zero, which supports the assumption of normality. The plot of fitted values versus residuals shows no systematic pattern, indicating that the residuals are not correlated with the fitted values. The ACF plot of residuals shows most lags within the confidence bounds, implying no significant autocorrelation in the residuals.
The accuracy measures for the model include an RMSE of 0.804, which is the key metric we use for evaluating model performance. This value indicates the average magnitude of the forecast error and suggests that the model performs reasonably well.
The forecast for the next 12 months predicts the time series values to gradually increase, with the point forecast for February 2025 being approximately 18.26. This gradual increase aligns with the observed upward movement in the historical data. The confidence intervals widen as the forecast horizon increases, reflecting greater uncertainty in longer-term predictions.
In summary, the Holt-Winters model provides reasonable accuracy with an RMSE of 0.804, though this is the highest RMSE among the four models compared. It predicts a steady upward movement with a repeating monthly pattern, which aligns with the historical data. However, because beta and gamma were optimized to 0, the trend and seasonal components are frozen at their initial estimates and do not adapt to new observations, so the forecasts may lag if those patterns shift. The residual analysis suggests that the model assumptions are met, so the forecasts are reasonable within that context.
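For reference, the additive Holt-Winters recursions make explicit what beta = 0 and gamma = 0 do. This sketch runs one update step with hypothetical state values (only alpha is taken from the fitted model above; the other numbers are made up for illustration):

```r
# One step of the additive Holt-Winters recursions:
#   level[t] = alpha * (y[t] - seas[t-p]) + (1 - alpha) * (level[t-1] + trend[t-1])
#   trend[t] = beta  * (level[t] - level[t-1]) + (1 - beta) * trend[t-1]
#   seas[t]  = gamma * (y[t] - level[t]) + (1 - gamma) * seas[t-p]
alpha <- 0.393; beta <- 0; gamma <- 0      # alpha from the fitted model
l_prev <- 15.87; b_prev <- 0.05; s_lag <- 0.4  # hypothetical states
y_t <- 16.5
l_t <- alpha * (y_t - s_lag) + (1 - alpha) * (l_prev + b_prev)
b_t <- beta * (l_t - l_prev) + (1 - beta) * b_prev  # = b_prev: trend frozen
s_t <- gamma * (y_t - l_t) + (1 - gamma) * s_lag    # = s_lag: seasonal frozen
```

With beta = gamma = 0 only the level adapts; the trend and seasonal terms pass through unchanged, which matches the fitted model's behavior.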
adf_test <- adf.test(sales_ts, alternative = "stationary")
print(adf_test)
##
## Augmented Dickey-Fuller Test
##
## data: sales_ts
## Dickey-Fuller = -1.449, Lag order = 2, p-value = 0.7823
## alternative hypothesis: stationary
ndiffs_required <- ndiffs(sales_ts)
print(paste("Number of differences to make stationary:", ndiffs_required))
## [1] "Number of differences to make stationary: 1"
sales_ts_diff <- diff(sales_ts, differences = ndiffs_required)
plot(sales_ts_diff, main = "Differenced Time Series", ylab = "Differenced Sales Units", xlab = "Time")
acf(sales_ts_diff, main = "ACF of Differenced Series")
pacf(sales_ts_diff, main = "PACF of Differenced Series")
auto_arima_model <- auto.arima(sales_ts)
print(summary(auto_arima_model))
## Series: sales_ts
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.4326
## s.e. 0.1464
##
## sigma^2 = 0.3474: log likelihood = -21.85
## AIC=47.7 AICc=48.25 BIC=50.14
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.08479764 0.5662927 0.4260231 0.4472622 2.831649 0.2676625
## ACF1
## Training set -0.1082698
best_model <- auto_arima_model
plot(residuals(best_model), main = "Residuals", ylab = "Residuals", xlab = "Time")
hist(residuals(best_model), main = "Histogram of Residuals", xlab = "Residuals")
plot(fitted(best_model), residuals(best_model), main = "Fitted Values vs. Residuals", xlab = "Fitted Values", ylab = "Residuals")
plot(sales_ts, residuals(best_model), main = "Actual Values vs. Residuals", xlab = "Actual Values", ylab = "Residuals")
acf(residuals(best_model), main = "ACF of Residuals")
pacf(residuals(best_model), main = "PACF of Residuals")
accuracy_metrics <- accuracy(best_model)
print(accuracy_metrics)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.08479764 0.5662927 0.4260231 0.4472622 2.831649 0.2676625
## ACF1
## Training set -0.1082698
forecast_1yr <- forecast(best_model, h = 12)
forecast_2yr <- forecast(best_model, h = 24)
plot(forecast_1yr, main = "1-Year ARIMA Forecast")
plot(forecast_2yr, main = "2-Year ARIMA Forecast")
print(forecast_1yr)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Mar 2024 16.0254 15.27004 16.78077 14.87017 17.18064
## Apr 2024 16.0254 15.15691 16.89390 14.69715 17.35365
## May 2024 16.0254 15.05690 16.99390 14.54421 17.50660
## Jun 2024 16.0254 14.96630 17.08451 14.40564 17.64516
## Jul 2024 16.0254 14.88286 17.16795 14.27803 17.77277
## Aug 2024 16.0254 14.80511 17.24570 14.15912 17.89168
## Sep 2024 16.0254 14.73203 17.31878 14.04735 18.00345
## Oct 2024 16.0254 14.66286 17.38795 13.94157 18.10924
## Nov 2024 16.0254 14.59703 17.45377 13.84090 18.20991
## Dec 2024 16.0254 14.53411 17.51669 13.74467 18.30613
## Jan 2025 16.0254 14.47374 17.57706 13.65234 18.39846
## Feb 2025 16.0254 14.41563 17.63517 13.56347 18.48733
print(forecast_2yr)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Mar 2024 16.0254 15.27004 16.78077 14.87017 17.18064
## Apr 2024 16.0254 15.15691 16.89390 14.69715 17.35365
## May 2024 16.0254 15.05690 16.99390 14.54421 17.50660
## Jun 2024 16.0254 14.96630 17.08451 14.40564 17.64516
## Jul 2024 16.0254 14.88286 17.16795 14.27803 17.77277
## Aug 2024 16.0254 14.80511 17.24570 14.15912 17.89168
## Sep 2024 16.0254 14.73203 17.31878 14.04735 18.00345
## Oct 2024 16.0254 14.66286 17.38795 13.94157 18.10924
## Nov 2024 16.0254 14.59703 17.45377 13.84090 18.20991
## Dec 2024 16.0254 14.53411 17.51669 13.74467 18.30613
## Jan 2025 16.0254 14.47374 17.57706 13.65234 18.39846
## Feb 2025 16.0254 14.41563 17.63517 13.56347 18.48733
## Mar 2025 16.0254 14.35955 17.69126 13.47770 18.57311
## Apr 2025 16.0254 14.30529 17.74551 13.39472 18.65608
## May 2025 16.0254 14.25270 17.79811 13.31428 18.73652
## Jun 2025 16.0254 14.20162 17.84919 13.23616 18.81464
## Jul 2025 16.0254 14.15193 17.89888 13.16017 18.89063
## Aug 2025 16.0254 14.10353 17.94728 13.08615 18.96466
## Sep 2025 16.0254 14.05631 17.99449 13.01394 19.03687
## Oct 2025 16.0254 14.01020 18.04060 12.94342 19.10738
## Nov 2025 16.0254 13.96513 18.08568 12.87449 19.17632
## Dec 2025 16.0254 13.92102 18.12979 12.80702 19.24378
## Jan 2026 16.0254 13.87781 18.17299 12.74095 19.30986
## Feb 2026 16.0254 13.83546 18.21535 12.67617 19.37463
The ARIMA analysis reveals that the time series is not stationary, as confirmed by the ADF test, which produced a p-value of 0.7823, meaning we cannot reject the null hypothesis of non-stationarity. To make the series stationary, one difference was applied, as suggested by the ndiffs() function. The differenced time series plot shows random fluctuations around zero, confirming it is stationary after differencing. No seasonal component is evident in the differenced series or the ACF/PACF plots. The ACF plot exhibits a significant spike at lag 1 followed by a rapid drop-off, while the PACF plot shows a significant spike at lag 1 with no significant spikes at higher lags. This behavior indicates that ARIMA(0,1,1) is an appropriate model, with ARIMA(1,1,0) as a secondary candidate.
The ARIMA(0,1,1) model was chosen based on its lower AIC (47.7), BIC (50.14), and sigma^2 (0.3474), signifying a better fit than the other candidates. The fitted model has a single MA(1) coefficient of -0.4326. Residual analysis confirms the model’s adequacy: the residuals fluctuate randomly around zero in the residual plot, with no discernible patterns in the fitted-values-vs-residuals or actual-values-vs-residuals plots. The histogram of residuals approximates a normal distribution, and the ACF plot of residuals shows no significant autocorrelations, indicating the model captures the dependencies effectively. The five accuracy measures (ME 0.0848, RMSE 0.5663, MAE 0.4260, MPE 0.4473, MAPE 2.8316) show the model is reasonably accurate, with a low percentage error.
The one-year forecast is a constant value of approximately 16.03, with prediction intervals gradually widening. The two-year forecast is the same constant, with greater uncertainty reflected in the wider intervals for the second year. Overall, the ARIMA(0,1,1) model fits this dataset well, accurately capturing the level of the series and providing reliable baseline forecasts for the next one and two years. Its simplicity, however, means it projects no seasonality or external effects, which should be revisited in future analyses if relevant patterns emerge.
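The flat ARIMA forecast is no coincidence: ARIMA(0,1,1) is algebraically equivalent to simple exponential smoothing with alpha = 1 + theta under R's MA sign convention, which explains why its forecast (~16.03) nearly matches the ETS(A,N,N) forecast (~16.02). A quick check using the fitted ma1 coefficient from the output above:

```r
# ARIMA(0,1,1) <-> SES correspondence: alpha = 1 + theta (R's ma1 sign)
theta <- -0.4326           # ma1 coefficient from the auto.arima() output
alpha_implied <- 1 + theta
alpha_implied              # ~0.567, close to the ETS alpha of 0.5558
```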
print("Accuracy Measures for Naïve Model:")
## [1] "Accuracy Measures for Naïve Model:"
print(accuracy_naive)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.053 0.6498938 0.51412 0.2460432 3.411387 0.3230122 -0.4356982
print("Accuracy Measures for ETS Model:")
## [1] "Accuracy Measures for ETS Model:"
print(accuracy_ets)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1008798 0.566215 0.4348502 0.5553676 2.889801 0.2732084
## ACF1
## Training set -0.1227497
print("Accuracy Measures for Holt-Winters Model:")
## [1] "Accuracy Measures for Holt-Winters Model:"
print(accuracy_hw)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.005262362 0.8045032 0.5856619 0.004108308 3.699873 0.3679606
## ACF1
## Training set -0.07687174
print("Accuracy Measures for ARIMA Model:")
## [1] "Accuracy Measures for ARIMA Model:"
print(accuracy_metrics)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.08479764 0.5662927 0.4260231 0.4472622 2.831649 0.2676625
## ACF1
## Training set -0.1082698
Naive Model: The naive method assumes that the forecast for the next period is equal to the last observed value. This method is particularly useful for datasets with random walk or highly volatile patterns without trends or seasonality. It is simple to implement and serves as a baseline for evaluating the performance of more complex models.
ETS Model: This method decomposes a time series into error, trend, and seasonal components. It optimally selects the appropriate components based on the data. Ideal for data with clear seasonality and trend components, as it adjusts to underlying patterns over time.
Holt-Winters Model: A type of exponential smoothing that accounts for both trend and seasonality, using alpha, beta, and gamma as smoothing parameters for the level, trend, and seasonal components. Effective for time series data with consistent seasonal patterns, allowing for both short-term and long-term forecasting.
ARIMA Model: Auto-Regressive Integrated Moving Average is a sophisticated method combining differencing, autoregression, and moving-average components to model the data’s structure. Excellent for non-seasonal data with trends or patterns that require differencing to achieve stationarity, and it adjusts for lags and dependencies in the data.
Best and Worst Forecast Methods Based on Each Accuracy Measure:
ME: Best: Holt-Winters Model (0.0053) Worst: ETS Model (0.1009) Holt-Winters has the smallest average error, making it reliable for unbiased forecasts.
RMSE: Best: ETS Model (0.5662) Worst: Holt-Winters Model (0.8045) ETS outperforms the others in minimizing error magnitude, making it the best for predicting future values.
MAE: Best: ARIMA Model (0.4260) Worst: Holt-Winters Model (0.5857) ARIMA’s low MAE reflects its strong ability to predict accurate values with minimal average deviation.
MPE: Best: Holt-Winters Model (0.0041) Worst: ETS Model (0.5554) Holt-Winters demonstrates minimal percentage bias, suitable for unbiased forecasting.
MAPE: Best: ARIMA Model (2.8316) Worst: Holt-Winters Model (3.6999) ARIMA provides forecasts with the lowest relative percentage errors, making it highly reliable.
MASE: Best: ARIMA Model (0.2677) Worst: Holt-Winters Model (0.3679) ARIMA’s scaled errors are the lowest, highlighting its efficiency relative to naive benchmarks.
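Collecting the training-set RMSEs quoted above into a single vector makes the ranking easy to read (values copied from the accuracy() outputs earlier in this report):

```r
# Training-set RMSEs from the four accuracy() outputs above
rmse <- c(Naive = 0.6499, ETS = 0.5662, HoltWinters = 0.8045, ARIMA = 0.5663)
sort(rmse)  # ETS narrowly edges out ARIMA; Holt-Winters has the largest RMSE
```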
Conclusion:
The time series analysis indicates a clear upward movement over the post-2022 period, with periodic fluctuations suggesting mild seasonality. The Holt-Winters forecast projects a gradual increase over the next year, while the ETS and ARIMA forecasts are essentially flat near the most recent level. In all cases the forecast intervals widen with the horizon, reflecting greater uncertainty over the two-year window.
The ranking of forecasting methods for this time series, by training RMSE, is:
ETS Model – Demonstrated the highest accuracy, with the lowest RMSE (0.5662); as an ETS(A,N,N) fit, it tracks the level of the series rather than explicit trend or seasonal components.
ARIMA Model – Essentially tied with ETS (RMSE 0.5663), making it an equally reliable option for capturing the level of this data.
Naive Model – A simple baseline (RMSE 0.6499) that nevertheless outperforms Holt-Winters on this series.
Holt-Winters Model – The highest RMSE (0.8045), suggesting its fixed trend and seasonal components did not improve accuracy on this short series.
In summary, the time series is on a moderate growth trajectory, and the ETS model is the most effective method for forecasting this data, followed closely by ARIMA. These models provide reliable insights for anticipating future trends.